Project: Outlier/Novelty Detection

Detecting defective grating lines of a diffraction grating

In [1]:
###############################################################################
# Outlier/novelty detection: detecting defective lines of a diffraction grating
# Sidney Göhler 544131
# IKT (M)
# Special Engineering SoSe20
# Prof. Dr. Andreas Zeiser
###############################################################################
import pandas as pd
import numpy as np

from scipy.io import loadmat
from scipy.stats import kurtosis
from scipy.stats import skew

import time
from sys import getsizeof
import dill
from itertools import permutations, count

from pathlib import Path

import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import cm
import matplotlib.gridspec as gridspec
import matplotlib.image as mpimg


from sklearn.model_selection import train_test_split

from sklearn.mixture import GaussianMixture
from sklearn.neighbors import LocalOutlierFactor
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM

from sklearn.metrics import classification_report, confusion_matrix, roc_curve, auc
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

from keras.layers import Input, Dense, Dropout, LSTM, Conv2D, MaxPooling2D, UpSampling2D
from keras import regularizers
from keras.losses import BinaryCrossentropy, mean_squared_error, KLDivergence
from keras.models import Model
from keras.callbacks import ModelCheckpoint, EarlyStopping

from helper import plot_stats, clust_eval, eval_gmm, ae_classy, ae_reconstruct, ae_reconstruct_fullset, create_model, setup_ae_and_train

%load_ext autoreload
%autoreload 2
In [2]:
#dill.load_session('notebook_env.db')

Load the MATLAB matrices

The data sets are stored in MATLAB format and are loaded first.
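As a side note, `scipy.io.loadmat` returns a plain dict: the MATLAB variable names become keys, alongside metadata entries such as `__header__`. A minimal self-contained sketch (using a throwaway file, not the project data):

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Write a tiny .mat file so the example is self-contained.
tmp = os.path.join(tempfile.mkdtemp(), "demo.mat")
savemat(tmp, {"realSpace": np.ones((3, 4))})

m = loadmat(tmp)
# Non-metadata keys are the MATLAB variable names.
print(sorted(k for k in m if not k.startswith("__")))  # ['realSpace']
print(m["realSpace"].shape)                            # (3, 4)
```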

In [3]:
path = "D:/Documents/Uni/Special Engineering/Projektdaten/"

files = ["fftArray_defectLines.mat",
         "fftArray_goodLines.mat",
         "realSpace_defectLines.mat",
         "realSpace_goodLines.mat",
         "timeTable_defectLines.mat",
         "timeTable_goodLines.mat"]

keys =['fftFeat','fftFeat','realSpace','realSpace','timeTable','timeTable']


mat = []
data =[]

for f in files:
    mat.append(loadmat(path+f))

    
# convert the first four data sets (fft and real space) to DataFrames
for i, m in enumerate(mat[:-2]):
    ds = m[keys[i]]
    print(getsizeof(ds), np.shape(ds), np.max(ds))
    data.append(pd.DataFrame(ds))
112 (111, 17089) 26.413447263145173
112 (109, 17236) 16.648241834407273
112 (223, 17089) 590.1683374744971
112 (218, 17236) 592.5773385229529

Data sets

The first four data sets are of primary interest. All models are evaluated on the real-space data, on the FFT data, and on a combination of both data sets.

Real Space

In [4]:
series = []
title = ['defect','good']


series.append(data[2][0:218].T[0:17089].T)
series.append(data[3].T[0:17089].T)

for s in series:
    print(np.shape(s))
(218, 17089)
(218, 17089)

PCA on real-space data

Given the extremely high dimensionality, it makes sense to reduce the number of features with PCA.

In [5]:
series_pca = []
pcas = []

for t in series:
    # transpose so that rows are samples: (17089, 218)
    t = t.T

    pca = PCA(n_components=80)
    pc = pca.fit(t)
    pcas.append(pc)
    trans = pc.transform(t)

    series_pca.append(trans)
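The choice of `n_components=80` is fixed here; a hedged way to sanity-check such a choice is the cumulative explained variance ratio, sketched below on synthetic stand-in data (not the project matrices):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for one transposed series (samples x features).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 120))

pca = PCA(n_components=80).fit(X)
cum = np.cumsum(pca.explained_variance_ratio_)
print(f"variance kept by 80 components: {cum[-1]:.3f}")

# Alternatively, let PCA pick the smallest number of components
# that keeps e.g. 95% of the variance:
pca95 = PCA(n_components=0.95).fit(X)
print("components for 95% variance:", pca95.n_components_)
```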

plot Real Space

In [6]:
X_train_defect = []
X_train_defect_pca = []
X_train_good = []
X_train_good_pca = []


for ind, s, sp, t, lab in zip(count(), series, series_pca, title, [0, 1]):
    
    color = cm.tab10(lab)
    
    print('##########################################'
          ,t,np.shape(s), np.shape(sp),'##########################################')
    
    fig = plt.figure(figsize=(15,7))
    ax = fig.add_subplot(111)
    ax.set_title(f'{t} full range')
    ax.set_ylim([585,593])
    ax.plot(s.T, c=color)
    plt.show()
    
    plot_stats(s, 0, color, True)
    
    fig = plt.figure(figsize=(15,7))
    ax1 = fig.add_subplot(111)
    ax1.set_title(f'{t} full range after principal component analysis')
    ax1.plot(sp.T, c=color)
    plt.show()
    
    
    # append the four statistics returned by plot_stats as extra features
    for r in range(len(s.T)):
        se = np.array(s[r])
        se_transformed = np.array(sp[r])

        a, b, c, d = plot_stats(se, 0)

        se = np.r_[se, a, b, c, d]
        se_transformed = np.r_[se_transformed, a, b, c, d]

        if lab == 0:
            X_train_defect.append(se)
            X_train_defect_pca.append(se_transformed)
        else:
            X_train_good.append(se)
            X_train_good_pca.append(se_transformed)
########################################## defect (218, 17089) (17089, 80) ##########################################
########################################## good (218, 17089) (17089, 80) ##########################################

Split the data

In [7]:
G = pd.DataFrame(X_train_good)
D = pd.DataFrame(X_train_defect)

Gp = pd.DataFrame(X_train_good_pca)
Dp = pd.DataFrame(X_train_defect_pca)

print(np.shape(G),np.shape(D),np.shape(Dp),np.shape(Gp))
(17089, 222) (17089, 222) (17089, 84) (17089, 84)
In [8]:
G_train, G_valid, D_train, D_valid = train_test_split(G, D, test_size=0.6666, shuffle = True)
G_test, G_valid, D_test, D_valid = train_test_split(G_valid, D_valid, test_size=0.5, shuffle = True)

print(np.shape(G_train),np.shape(G_valid),np.shape(G_test))
print(np.shape(D_train),np.shape(D_valid),np.shape(D_test))
(5697, 222) (5696, 222) (5696, 222)
(5697, 222) (5696, 222) (5696, 222)
In [9]:
G_trainp, G_validp, D_trainp, D_validp = train_test_split(Gp, Dp, test_size=0.6666, shuffle = True)
G_testp, G_validp, D_testp, D_validp = train_test_split(G_validp, D_validp, test_size=0.5, shuffle = True)

print(np.shape(G_trainp),np.shape(G_validp),np.shape(G_testp))
print(np.shape(D_trainp),np.shape(D_validp),np.shape(D_testp))
(5697, 84) (5696, 84) (5696, 84)
(5697, 84) (5696, 84) (5696, 84)
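The two chained `train_test_split` calls above produce an approximate 1/3 : 1/3 : 1/3 split into train, validation, and test. A self-contained sketch of the same pattern (synthetic data; `random_state` added only for reproducibility):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(9000).reshape(-1, 1)

# First split: keep ~1/3 for training, pass ~2/3 on.
X_train, X_rest = train_test_split(X, test_size=0.6666, shuffle=True, random_state=0)
# Second split: halve the remainder into test and validation.
X_test, X_valid = train_test_split(X_rest, test_size=0.5, shuffle=True, random_state=0)

print(len(X_train), len(X_test), len(X_valid))  # roughly 3000 / 3000 / 3000
```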
FFT spectrum

In [10]:
fft = []
title = ['defect','good']


fft.append(data[0][2:111].T[0:17089])
fft.append(data[1][0:109].T[0:17089])

for s in fft:
    print(np.shape(s))
(17089, 109)
(17089, 109)

PCA on FFT

In [12]:
fft_pca = []
pcas = []

for t in fft:
    pca = PCA(n_components=50)
    pc = pca.fit(t)
    pcas.append(pc)
    trans = pc.transform(t)
    print(np.shape(trans))
    fft_pca.append(trans)
(17089, 50)
(17089, 50)

plot

In [13]:
ticks = range(0,len(Dp),1000)
#print(type(data[:-3]),np.shape(data[:-3]))

X_train_fftdefect = []
X_train_fftdefect_pca = []
X_train_fftgood = []
X_train_fftgood_pca = []



for ind, s, sp, t, lab in zip(count(), fft, fft_pca, title, [0, 1]):

    print('##########################################'
          ,t,np.shape(s), np.shape(sp),'##########################################')
    ds = s.rename(columns={x:y for x,y in zip(s.columns,range(0,len(s.columns)))})
    ds = ds.reset_index(drop=True)  
    ds = ds.T / np.max(ds.T)
    
    df = ds

    fig = plt.figure(figsize=(14,12))
    ax = fig.add_subplot(111)
    ax.set_title(f'{t} fft full range')
   # ax.set_xticklabels(ticks)
    mat = ax.matshow(df, cmap = cm.magma)
    fig.colorbar(mat, ax=ax, orientation='vertical')
    ax.set_aspect('auto')
    plt.show()
    
    color = cm.tab10(lab)
    plot_stats(s, 0, color, True)
    
    fig = plt.figure(figsize=(14,12))
    ax1 = fig.add_subplot(111)
    ax1.set_title(f'{t} fft full range after principal component analysis')
    mat = ax1.matshow(sp, cmap = cm.magma)
    fig.colorbar(mat, ax=ax1, orientation='vertical')
    ax1.set_aspect('auto')
    plt.show()
    
    # append the four statistics returned by plot_stats as extra features
    for r in range(len(df.T)):
        se = np.array(df[r])
        se_transformed = np.array(sp[r])

        a, b, c, d = plot_stats(se, 0)

        se = np.r_[se, a, b, c, d]
        se_transformed = np.r_[se_transformed, a, b, c, d]

        if lab == 0:
            X_train_fftdefect.append(se)
            X_train_fftdefect_pca.append(se_transformed)
        else:
            X_train_fftgood.append(se)
            X_train_fftgood_pca.append(se_transformed)

    
    
########################################## defect (17089, 109) (17089, 50) ##########################################
########################################## good (17089, 109) (17089, 50) ##########################################

split

In [30]:
### Split the data
Gfft = pd.DataFrame(X_train_fftgood)
Dfft = pd.DataFrame(X_train_fftdefect)

Gpfft = pd.DataFrame(X_train_fftgood_pca)
Dpfft = pd.DataFrame(X_train_fftdefect_pca)

print(np.shape(Gfft),np.shape(Dfft),np.shape(Dpfft),np.shape(Gpfft))
(17089, 113) (17089, 113) (17089, 54) (17089, 54)
In [31]:
G_trainfft, G_validfft, D_trainfft, D_validfft = train_test_split(Gfft, Dfft, test_size=0.6666, shuffle = True)
G_testfft, G_validfft, D_testfft, D_validfft = train_test_split(G_validfft, D_validfft, test_size=0.5, shuffle = True)

print(np.shape(G_trainfft),np.shape(G_validfft),np.shape(G_testfft))
print(np.shape(D_trainfft),np.shape(D_validfft),np.shape(D_testfft))
(5697, 113) (5696, 113) (5696, 113)
(5697, 113) (5696, 113) (5696, 113)
In [32]:
G_trainpfft, G_validpfft, D_trainpfft, D_validpfft = train_test_split(Gpfft, Dpfft, test_size=0.6666, shuffle = True)
G_testpfft, G_validpfft, D_testpfft, D_validpfft = train_test_split(G_validpfft, D_validpfft, test_size=0.5, shuffle = True)

print(np.shape(G_trainpfft),np.shape(G_validpfft),np.shape(G_testpfft))
print(np.shape(D_trainpfft),np.shape(D_validpfft),np.shape(D_testpfft))
(5697, 54) (5696, 54) (5696, 54)
(5697, 54) (5696, 54) (5696, 54)

Next, a combined data set is built by concatenating the FFT spectrum with the real-space data. The value range is scaled and the dimensionality is then reduced again with PCA.

Combined data set: FFT spectrum + real space

In [17]:
comb = []
title = ['defect','good']

datafft= [data[0][2:111].T[0:17089], data[1][0:109].T[0:17089]]
datareal = [data[2][0:218].T[0:17089], data[3].T[0:17089]]

for ind, d in enumerate(datafft):
    datafft[ind] = StandardScaler().fit_transform(d)

for ind, d in enumerate(datareal):
    datareal[ind] = StandardScaler().fit_transform(d)

    
    
combi0 = np.c_[datafft[0], datareal[0]]
combi1 = np.c_[datafft[1], datareal[1]]


print(np.shape(combi0),np.shape(combi1))

comb.append(combi0)
comb.append(combi1)
(17089, 327) (17089, 327)

PCA on combined Dataset

In [19]:
comb_pca = []
pcas = []

for t in comb:
    pca = PCA(n_components=100)
    pc = pca.fit(t)
    pcas.append(pc)
    trans = pc.transform(t)
    print(np.shape(trans))
    comb_pca.append(trans)
(17089, 100)
(17089, 100)

plot

In [20]:
X_train_combdefect = []
X_train_combdefect_pca = []
X_train_combgood = []
X_train_combgood_pca = []


for ind, s, sp, t, lab in zip(count(), comb, comb_pca, title, [0, 1]):

    print(ind)
    df = s  
    dp = sp
    
    fig = plt.figure(figsize=(14,12))
    ax = fig.add_subplot(111)
    ax.set_title(f'{t} combined full range')
    mat = ax.matshow(df, cmap = cm.magma)
    fig.colorbar(mat, ax=ax, orientation='vertical')
    ax.set_aspect('auto')
    plt.show()

    color = cm.tab10(lab)
    plot_stats(s, 0, color, True)

    fig = plt.figure(figsize=(14,12))
    ax1 = fig.add_subplot(111)
    ax1.set_title(f'{t} combined full range after principal component analysis')
    mat = ax1.matshow(dp, cmap = cm.magma)
    fig.colorbar(mat, ax=ax1, orientation='vertical')
    ax1.set_aspect('auto')
    plt.show()
    
    
    # append the four statistics returned by plot_stats as extra features
    for r in range(len(df)):
        se = np.array(df[r])
        se_transformed = np.array(sp[r])

        a, b, c, d = plot_stats(se, 0)

        se = np.r_[se, a, b, c, d]
        se_transformed = np.r_[se_transformed, a, b, c, d]

        if lab == 0:
            X_train_combdefect.append(se)
            X_train_combdefect_pca.append(se_transformed)
        else:
            X_train_combgood.append(se)
            X_train_combgood_pca.append(se_transformed)
0
1
In [21]:
Gcomb = pd.DataFrame(X_train_combgood)
Dcomb = pd.DataFrame(X_train_combdefect)

Gpcomb = pd.DataFrame(X_train_combgood_pca)
Dpcomb = pd.DataFrame(X_train_combdefect_pca)

print(np.shape(Gcomb),np.shape(Dcomb),np.shape(Dpcomb),np.shape(Gpcomb))
(17089, 331) (17089, 331) (17089, 104) (17089, 104)
In [22]:
G_traincomb, G_validcomb, D_traincomb, D_validcomb = train_test_split(Gcomb, Dcomb, test_size=0.6666, shuffle = True)
G_testcomb, G_validcomb, D_testcomb, D_validcomb = train_test_split(G_validcomb, D_validcomb, test_size=0.5, shuffle = True)

print(np.shape(G_traincomb),np.shape(G_validcomb),np.shape(G_testcomb))
print(np.shape(D_traincomb),np.shape(D_validcomb),np.shape(D_testcomb))
(5697, 331) (5696, 331) (5696, 331)
(5697, 331) (5696, 331) (5696, 331)
In [23]:
G_trainpcomb, G_validpcomb, D_trainpcomb, D_validpcomb = train_test_split(Gpcomb, Dpcomb, test_size=0.6666, shuffle = True)
G_testpcomb, G_validpcomb, D_testpcomb, D_validpcomb = train_test_split(G_validpcomb, D_validpcomb, test_size=0.5, shuffle = True)

print(np.shape(G_trainpcomb),np.shape(G_validpcomb),np.shape(G_testpcomb))
print(np.shape(D_trainpcomb),np.shape(D_validpcomb),np.shape(D_testpcomb))
(5697, 104) (5696, 104) (5696, 104)
(5697, 104) (5696, 104) (5696, 104)

GMM

Several GMMs are fitted; the optimal number of Gaussian components is estimated first.

estimate number of components

In [24]:
X_train = [G_trainp, G_trainpfft, G_trainpcomb]

min_cluster = 1
max_cluster = 15

h, w = len(X_train), max_cluster
bic = [[0 for x in range(w)] for y in range(h)]  # note: index 0 is never filled since min_cluster = 1


for ind, ds in enumerate(X_train):
    print('GM on ds',ind+1,'with shape',np.shape(ds),'for',max_cluster,'components')
    for i in range(min_cluster, max_cluster):
        
        gm = GaussianMixture(n_components=i,
                               covariance_type='full',
                               tol=0.0005, 
                               reg_covar=0.0005,  
                               max_iter=100,
                               n_init=10,
                               init_params='kmeans',
                               verbose=0)

        gm.fit(ds)

        bic[ind][i] = gm.bic(ds)
GM on ds 1 with shape (5697, 84) for 15 components
GM on ds 2 with shape (5697, 54) for 15 components
GM on ds 3 with shape (5697, 104) for 15 components
In [65]:
n_best =[]

for b in bic:
    fig = plt.figure(figsize=(10,8))
    #print(np.argmin(b))
    plt.plot(range(1, max_cluster+1), b)
    plt.scatter(range(1, max_cluster+1)[np.argmin(b)], np.min(b), color='r')
    plt.show()
    print('min',np.min(b),'at',range(1, max_cluster+1)[np.argmin(b)])
    n_best.append(np.min(b))
min -1643410.5132927685 at 8
min 0.0 at 1
min 0.0 at 1
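Note that `bic[ind][0]` is never written (the loop starts at `min_cluster = 1`), which explains the spurious `min 0.0 at 1` for the last two data sets, whose actual BIC values are positive. The BIC-based selection itself can be sketched self-contained on synthetic blobs:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Three well-separated clusters as a stand-in data set.
X, _ = make_blobs(n_samples=600, centers=3, random_state=0)

bics = []
for k in range(1, 8):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    bics.append(gm.bic(X))

# The component count with the lowest BIC is preferred.
best_k = int(np.argmin(bics)) + 1
print("BIC-optimal number of components:", best_k)
```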

GMM on realspace

In [41]:
trainset = G_trainp

# estimate for y_train==1 (good lines)
print('fit on',np.shape(trainset))
gm_realspace = GaussianMixture(n_components=8,
                               covariance_type='diag',
                               tol=0.00005, 
                               reg_covar=0.000005,  
                               max_iter=100,
                               n_init=10,
                               init_params='kmeans',
                               verbose=0)

gm_realspace.fit(trainset)

print('GMM converged =',gm_realspace.converged_)
print('in',gm_realspace.n_iter_,'iters')
fit on (5697, 84)
GMM converged = True
in 70 iters
In [42]:
print('evaluate realspace\n\n')

threshold = 99.994

y_pred_gmm, y_pred_gmm_good = eval_gmm(gm_realspace, Gp, G_trainp, G_testp, G_validp, Dp, threshold)
evaluate realspace


traindata: 1
testdata: 0
validationsdata: 0
######################
estimated on Good Lines DS: 1 


######################
estimated on Defect Lines DS: 7
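`eval_gmm` comes from the project's `helper` module, so its internals are not shown here. Assuming it works like typical GMM novelty detection, i.e. thresholding `score_samples` at a percentile of the training log-likelihoods (an assumption, not the confirmed helper logic), the idea can be sketched as:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X_good = rng.normal(0.0, 1.0, size=(2000, 5))   # stand-in for good lines
X_defect = rng.normal(4.0, 1.0, size=(200, 5))  # stand-in for defect lines

gm = GaussianMixture(n_components=2, random_state=0).fit(X_good)

# A threshold of 99.994 would then mean: flag everything whose
# log-likelihood falls below nearly all of the training scores.
scores_train = gm.score_samples(X_good)
thr = np.percentile(scores_train, 100.0 - 99.994)

flag_defect = gm.score_samples(X_defect) < thr
print("defects flagged:", int(flag_defect.sum()), "of", len(X_defect))
```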

GMM on fft

In [43]:
trainset = G_trainpfft

# estimate for y_train==1 (good lines) G_trainpfft
print('fit on',np.shape(trainset))
gm_fft = GaussianMixture(n_components=4,
                               covariance_type='diag',
                               tol=0.005, 
                               reg_covar=0.005, 
                               max_iter=100,
                               n_init=10,
                               init_params='kmeans',
                               verbose=0)

gm_fft.fit(trainset)

print('GMM converged =',gm_fft.converged_)
print('in',gm_fft.n_iter_,'iters')
fit on (5697, 54)
GMM converged = True
in 11 iters
In [44]:
print('evaluate fft\n\n')
threshold = 99.986

y_pred_gmmfft, y_pred_gmmfft_good = eval_gmm(gm_fft, Gpfft, G_trainpfft, G_testpfft, G_validpfft, Dpfft, threshold)
evaluate fft


traindata: 1
testdata: 0
validationsdata: 1
######################
estimated on Good Lines DS: 2 


######################
estimated on Defect Lines DS: 20

GMM on combined

In [45]:
trainset = G_trainpcomb

# estimate for good lines
print('fit on',np.shape(trainset))
gm_comb = GaussianMixture(n_components=2,
                               covariance_type='diag',
                               tol=0.5, 
                               reg_covar=0.5, 
                               max_iter=100,
                               n_init=10,
                               init_params='kmeans',
                               verbose=0)

gm_comb.fit(trainset)

print('GMM converged =',gm_comb.converged_)
print('in',gm_comb.n_iter_,'iters')
fit on (5697, 104)
GMM converged = True
in 2 iters
In [46]:
print('evaluate combined')
threshold = 99.999

y_pred_gmmcomb, y_pred_gmmcomb_good = eval_gmm(gm_comb, Gpcomb, G_trainpcomb, G_testpcomb, G_validpcomb, Dpcomb, threshold)
evaluate combined
traindata: 1
testdata: 0
validationsdata: 0
######################
estimated on Good Lines DS: 1 


######################
estimated on Defect Lines DS: 8
In [47]:
unique, counts = np.unique(y_pred_gmm, return_counts=True)
uniquefft, countsfft = np.unique(y_pred_gmmfft, return_counts=True)
uniquecomb, countscomb = np.unique(y_pred_gmmcomb, return_counts=True)

uniquegood, countsgood = np.unique(y_pred_gmm_good, return_counts=True)
uniquefftgood, countsfftgood = np.unique(y_pred_gmmfft_good, return_counts=True)
uniquecombgood, countscombgood = np.unique(y_pred_gmmcomb_good, return_counts=True)


index = ['predict defect']

df_gmmdefect = pd.DataFrame({'realspace': counts[0],
                   'fft': countsfft[0],
                   'combined': countscomb[0]}, index=index)

df_gmmgood = pd.DataFrame({'realspace': countsgood[0],
                   'fft': countsfftgood[0],
                   'combined': countscombgood[0]}, index=index)

df_gmm = pd.concat([df_gmmdefect, df_gmmgood], axis=0, sort=False, keys=['true defect','true good'])

fig = plt.figure(figsize=(14,7))

ax = fig.add_subplot(111)
ax.set_title('GMM outlier prediction on defect lines')
df_gmm.plot.bar(rot=0, ax=ax, width = 0.4)
plt.show()
print(df_gmm.T)


#false negative label for viz
y_pred_gmm[y_pred_gmm_good==0] = 2
y_pred_gmmfft[y_pred_gmmfft_good==0] = 2
y_pred_gmmcomb[y_pred_gmmcomb_good==0] = 2
             true defect      true good
          predict defect predict defect
realspace              7              1
fft                   20              2
combined               8              1

Autoencoder

Several autoencoders are trained on the training set and initially validated only on the good lines.
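The training and reconstruction-error evaluation are encapsulated in the helper functions. The underlying principle: train on good lines only, then flag samples whose reconstruction error exceeds a threshold derived from the training errors. A hedged sketch using PCA as a stand-in linear autoencoder (`transform` = encode, `inverse_transform` = decode; the real model is the Keras network below):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# Good lines live near a low-dimensional subspace; defects do not.
basis = rng.normal(size=(3, 20))
X_good = rng.normal(size=(1000, 3)) @ basis
X_defect = rng.normal(size=(100, 20)) * 3.0

ae = PCA(n_components=3).fit(X_good)

def recon_error(X):
    # Mean squared error between input and its encode/decode round trip.
    return np.mean((X - ae.inverse_transform(ae.transform(X))) ** 2, axis=1)

thr = np.percentile(recon_error(X_good), 99)  # tolerate ~1% false alarms
print("defects over threshold:", int((recon_error(X_defect) > thr).sum()), "of 100")
```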

AE on realspace

In [48]:
autoencoder, history = setup_ae_and_train(30, 100, 10, G_trainp.shape[1], G_trainp, G_validp)
    
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 84)]              0         
_________________________________________________________________
dense (Dense)                (None, 42)                3570      
_________________________________________________________________
dropout (Dropout)            (None, 42)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 21)                903       
_________________________________________________________________
dropout_1 (Dropout)          (None, 21)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 42)                924       
_________________________________________________________________
dense_3 (Dense)              (None, 84)                3612      
=================================================================
Total params: 9,009
Trainable params: 9,009
Non-trainable params: 0
_________________________________________________________________

saving model to D:/Documents/Uni/!code/git/python/proj/modelle/autoencoder_transposed_1597046268.4397676.h5

Epoch 1/30
57/57 [==============================] - 1s 15ms/step - loss: 2.6746 - accuracy: 0.6516 - val_loss: 1.4665 - val_accuracy: 0.9798
Epoch 2/30
57/57 [==============================] - 1s 10ms/step - loss: 0.9033 - accuracy: 0.9719 - val_loss: 0.1959 - val_accuracy: 0.9819
Epoch 3/30
57/57 [==============================] - 1s 9ms/step - loss: 0.3965 - accuracy: 0.9839 - val_loss: 0.1239 - val_accuracy: 0.9914
Epoch 4/30
57/57 [==============================] - 1s 9ms/step - loss: 0.3130 - accuracy: 0.9877 - val_loss: 0.1076 - val_accuracy: 0.9960
Epoch 5/30
57/57 [==============================] - 1s 10ms/step - loss: 0.2706 - accuracy: 0.9877 - val_loss: 0.0993 - val_accuracy: 0.9963
Epoch 6/30
57/57 [==============================] - 0s 9ms/step - loss: 0.2419 - accuracy: 0.9881 - val_loss: 0.1075 - val_accuracy: 0.9963
Epoch 7/30
57/57 [==============================] - 0s 9ms/step - loss: 0.2217 - accuracy: 0.9888 - val_loss: 0.0912 - val_accuracy: 0.9946
Epoch 8/30
57/57 [==============================] - 0s 8ms/step - loss: 0.1958 - accuracy: 0.9886 - val_loss: 0.0940 - val_accuracy: 0.9946
Epoch 9/30
57/57 [==============================] - 0s 9ms/step - loss: 0.1938 - accuracy: 0.9891 - val_loss: 0.0790 - val_accuracy: 0.9935
Epoch 10/30
57/57 [==============================] - 0s 8ms/step - loss: 0.1741 - accuracy: 0.9879 - val_loss: 0.0900 - val_accuracy: 0.9921
Epoch 11/30
57/57 [==============================] - 0s 8ms/step - loss: 0.1686 - accuracy: 0.9886 - val_loss: 0.0779 - val_accuracy: 0.9942
Epoch 12/30
57/57 [==============================] - 0s 8ms/step - loss: 0.1593 - accuracy: 0.9877 - val_loss: 0.0880 - val_accuracy: 0.9939
Epoch 13/30
57/57 [==============================] - 0s 8ms/step - loss: 0.1538 - accuracy: 0.9889 - val_loss: 0.0815 - val_accuracy: 0.9928
Epoch 14/30
57/57 [==============================] - 0s 8ms/step - loss: 0.1455 - accuracy: 0.9879 - val_loss: 0.0925 - val_accuracy: 0.9939
Epoch 15/30
57/57 [==============================] - 0s 8ms/step - loss: 0.1411 - accuracy: 0.9889 - val_loss: 0.0896 - val_accuracy: 0.9951
Epoch 16/30
57/57 [==============================] - 0s 8ms/step - loss: 0.1360 - accuracy: 0.9910 - val_loss: 0.0843 - val_accuracy: 0.9944
Epoch 17/30
57/57 [==============================] - 0s 8ms/step - loss: 0.1304 - accuracy: 0.9868 - val_loss: 0.0726 - val_accuracy: 0.9940
Epoch 18/30
57/57 [==============================] - 0s 8ms/step - loss: 0.1298 - accuracy: 0.9914 - val_loss: 0.0878 - val_accuracy: 0.9937
Epoch 19/30
57/57 [==============================] - 0s 8ms/step - loss: 0.1209 - accuracy: 0.9896 - val_loss: 0.1052 - val_accuracy: 0.9889
Epoch 20/30
57/57 [==============================] - 0s 8ms/step - loss: 0.1230 - accuracy: 0.9903 - val_loss: 0.0947 - val_accuracy: 0.9917
Epoch 21/30
57/57 [==============================] - 1s 9ms/step - loss: 0.1205 - accuracy: 0.9891 - val_loss: 0.1043 - val_accuracy: 0.9921
Epoch 22/30
57/57 [==============================] - 1s 9ms/step - loss: 0.1122 - accuracy: 0.9900 - val_loss: 0.1021 - val_accuracy: 0.9925
Epoch 23/30
57/57 [==============================] - 1s 9ms/step - loss: 0.1124 - accuracy: 0.9905 - val_loss: 0.0852 - val_accuracy: 0.9940
Epoch 24/30
57/57 [==============================] - 0s 8ms/step - loss: 0.1093 - accuracy: 0.9889 - val_loss: 0.0907 - val_accuracy: 0.9882
Epoch 25/30
57/57 [==============================] - 0s 8ms/step - loss: 0.1132 - accuracy: 0.9902 - val_loss: 0.0993 - val_accuracy: 0.9909
Epoch 26/30
57/57 [==============================] - 0s 9ms/step - loss: 0.1034 - accuracy: 0.9900 - val_loss: 0.1081 - val_accuracy: 0.9925
Epoch 27/30
57/57 [==============================] - 0s 8ms/step - loss: 0.1059 - accuracy: 0.9914 - val_loss: 0.0951 - val_accuracy: 0.9909
min val loss: 0.07258044183254242
In [49]:
        
lab = ['train','test','validation']
title = ['1 good', '0 defect']
sets = [[G_trainp, G_testp, G_validp], [D_trainp, D_testp, D_validp]]

ae_reconstruct(lab, title, sets, autoencoder)
In [50]:
title = ['0 defect', '1 good']
sets = [Dp, Gp]
threshold, losses = ae_reconstruct_fullset(title, sets, autoencoder)

AE on FFT

In [51]:
autoencoderfft, historyfft = setup_ae_and_train(30, 100, 10, Gpfft.shape[1], G_trainpfft, G_validpfft)
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         [(None, 54)]              0         
_________________________________________________________________
dense_4 (Dense)              (None, 27)                1485      
_________________________________________________________________
dropout_2 (Dropout)          (None, 27)                0         
_________________________________________________________________
dense_5 (Dense)              (None, 13)                364       
_________________________________________________________________
dropout_3 (Dropout)          (None, 13)                0         
_________________________________________________________________
dense_6 (Dense)              (None, 27)                378       
_________________________________________________________________
dense_7 (Dense)              (None, 54)                1512      
=================================================================
Total params: 3,739
Trainable params: 3,739
Non-trainable params: 0
_________________________________________________________________

saving model to D:/Documents/Uni/!code/git/python/proj/modelle/autoencoder_transposed_1597046294.7889206.h5

Epoch 1/30
57/57 [==============================] - 1s 10ms/step - loss: 4.0016 - accuracy: 0.2247 - val_loss: 3.1257 - val_accuracy: 0.8453
Epoch 2/30
57/57 [==============================] - 0s 8ms/step - loss: 2.0694 - accuracy: 0.8359 - val_loss: 0.9911 - val_accuracy: 0.8559
Epoch 3/30
57/57 [==============================] - 0s 8ms/step - loss: 1.2730 - accuracy: 0.8659 - val_loss: 0.8537 - val_accuracy: 0.8559
Epoch 4/30
57/57 [==============================] - 0s 8ms/step - loss: 1.1486 - accuracy: 0.8664 - val_loss: 0.8022 - val_accuracy: 0.8559
Epoch 5/30
57/57 [==============================] - 0s 8ms/step - loss: 1.0702 - accuracy: 0.8664 - val_loss: 0.7472 - val_accuracy: 0.8559
Epoch 6/30
57/57 [==============================] - 0s 8ms/step - loss: 1.0056 - accuracy: 0.8662 - val_loss: 0.6838 - val_accuracy: 0.8559
Epoch 7/30
57/57 [==============================] - 0s 8ms/step - loss: 0.9399 - accuracy: 0.8676 - val_loss: 0.6476 - val_accuracy: 0.8559
Epoch 8/30
57/57 [==============================] - 0s 8ms/step - loss: 0.8738 - accuracy: 0.8745 - val_loss: 0.5904 - val_accuracy: 0.8840
Epoch 9/30
57/57 [==============================] - 0s 8ms/step - loss: 0.8239 - accuracy: 0.8785 - val_loss: 0.5899 - val_accuracy: 0.8945
Epoch 10/30
57/57 [==============================] - 0s 8ms/step - loss: 0.7874 - accuracy: 0.8834 - val_loss: 0.5663 - val_accuracy: 0.8983
Epoch 11/30
57/57 [==============================] - 0s 8ms/step - loss: 0.7565 - accuracy: 0.8875 - val_loss: 0.5293 - val_accuracy: 0.8957
Epoch 12/30
57/57 [==============================] - 0s 8ms/step - loss: 0.7208 - accuracy: 0.8878 - val_loss: 0.5378 - val_accuracy: 0.9010
Epoch 13/30
57/57 [==============================] - 0s 7ms/step - loss: 0.7031 - accuracy: 0.8919 - val_loss: 0.5386 - val_accuracy: 0.9052
Epoch 14/30
57/57 [==============================] - 0s 8ms/step - loss: 0.6895 - accuracy: 0.8954 - val_loss: 0.5075 - val_accuracy: 0.9094
Epoch 15/30
57/57 [==============================] - 0s 8ms/step - loss: 0.6608 - accuracy: 0.8894 - val_loss: 0.5000 - val_accuracy: 0.9082
Epoch 16/30
57/57 [==============================] - 0s 8ms/step - loss: 0.6457 - accuracy: 0.8980 - val_loss: 0.5149 - val_accuracy: 0.9149
Epoch 17/30
57/57 [==============================] - 0s 8ms/step - loss: 0.6278 - accuracy: 0.8945 - val_loss: 0.4939 - val_accuracy: 0.9187
Epoch 18/30
57/57 [==============================] - 0s 8ms/step - loss: 0.6215 - accuracy: 0.9001 - val_loss: 0.5105 - val_accuracy: 0.9250
Epoch 19/30
57/57 [==============================] - 0s 7ms/step - loss: 0.6078 - accuracy: 0.8971 - val_loss: 0.4988 - val_accuracy: 0.9277
Epoch 20/30
57/57 [==============================] - 0s 8ms/step - loss: 0.5951 - accuracy: 0.9014 - val_loss: 0.5085 - val_accuracy: 0.9305
Epoch 21/30
57/57 [==============================] - 0s 8ms/step - loss: 0.5939 - accuracy: 0.9017 - val_loss: 0.4860 - val_accuracy: 0.9247
Epoch 22/30
57/57 [==============================] - 1s 9ms/step - loss: 0.5869 - accuracy: 0.9050 - val_loss: 0.4749 - val_accuracy: 0.9300
Epoch 23/30
57/57 [==============================] - 0s 8ms/step - loss: 0.5810 - accuracy: 0.9042 - val_loss: 0.5078 - val_accuracy: 0.9352
Epoch 24/30
57/57 [==============================] - 0s 8ms/step - loss: 0.5668 - accuracy: 0.9087 - val_loss: 0.5007 - val_accuracy: 0.9373
Epoch 25/30
57/57 [==============================] - 0s 8ms/step - loss: 0.5622 - accuracy: 0.9059 - val_loss: 0.5021 - val_accuracy: 0.9401
Epoch 26/30
57/57 [==============================] - 0s 8ms/step - loss: 0.5553 - accuracy: 0.9029 - val_loss: 0.4969 - val_accuracy: 0.9401
Epoch 27/30
57/57 [==============================] - 0s 8ms/step - loss: 0.5538 - accuracy: 0.9105 - val_loss: 0.4855 - val_accuracy: 0.9417
Epoch 28/30
57/57 [==============================] - 0s 7ms/step - loss: 0.5416 - accuracy: 0.9098 - val_loss: 0.4773 - val_accuracy: 0.9405
Epoch 29/30
57/57 [==============================] - 0s 8ms/step - loss: 0.5325 - accuracy: 0.9107 - val_loss: 0.5122 - val_accuracy: 0.9433
Epoch 30/30
57/57 [==============================] - 0s 8ms/step - loss: 0.5346 - accuracy: 0.9131 - val_loss: 0.4813 - val_accuracy: 0.9429
min val loss: 0.4749375283718109
In [52]:
lab = ['train','test','validation']
title = ['1 good', '0 defect']
sets = [[G_trainpfft, G_testpfft, G_validpfft], [D_trainpfft, D_testpfft, D_validpfft]]

ae_reconstruct(lab, title, sets, autoencoderfft)
In [53]:
title = ['0 defect', '1 good']
sets = [Dpfft, Gpfft]
thresholdfft, lossesfft = ae_reconstruct_fullset(title, sets, autoencoderfft)
In [ ]:

AE on the combined dataset

In [54]:
autoencodercomb, historycomb = setup_ae_and_train(30, 100, 10, Gpcomb.shape[1], G_trainpcomb, G_validpcomb)
Model: "model_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_3 (InputLayer)         [(None, 104)]             0         
_________________________________________________________________
dense_8 (Dense)              (None, 52)                5460      
_________________________________________________________________
dropout_4 (Dropout)          (None, 52)                0         
_________________________________________________________________
dense_9 (Dense)              (None, 26)                1378      
_________________________________________________________________
dropout_5 (Dropout)          (None, 26)                0         
_________________________________________________________________
dense_10 (Dense)             (None, 52)                1404      
_________________________________________________________________
dense_11 (Dense)             (None, 104)               5512      
=================================================================
Total params: 13,754
Trainable params: 13,754
Non-trainable params: 0
_________________________________________________________________

saving model to D:/Documents/Uni/!code/git/python/proj/modelle/autoencoder_transposed_1597046313.842951.h5

Epoch 1/30
57/57 [==============================] - 1s 10ms/step - loss: 2.9351 - accuracy: 0.2017 - val_loss: 1.9368 - val_accuracy: 0.4447
Epoch 2/30
57/57 [==============================] - 0s 8ms/step - loss: 1.4181 - accuracy: 0.4372 - val_loss: 1.0315 - val_accuracy: 0.4703
Epoch 3/30
57/57 [==============================] - 0s 8ms/step - loss: 1.0995 - accuracy: 0.4750 - val_loss: 0.8405 - val_accuracy: 0.5083
Epoch 4/30
57/57 [==============================] - 0s 8ms/step - loss: 0.9843 - accuracy: 0.4969 - val_loss: 0.7719 - val_accuracy: 0.5500
Epoch 5/30
57/57 [==============================] - 0s 8ms/step - loss: 0.9191 - accuracy: 0.5319 - val_loss: 0.7198 - val_accuracy: 0.5941
Epoch 6/30
57/57 [==============================] - 0s 8ms/step - loss: 0.8688 - accuracy: 0.5619 - val_loss: 0.6835 - val_accuracy: 0.6505
Epoch 7/30
57/57 [==============================] - 0s 9ms/step - loss: 0.8456 - accuracy: 0.5838 - val_loss: 0.6500 - val_accuracy: 0.6922
Epoch 8/30
57/57 [==============================] - 0s 8ms/step - loss: 0.8022 - accuracy: 0.6005 - val_loss: 0.6158 - val_accuracy: 0.7091
Epoch 9/30
57/57 [==============================] - 0s 9ms/step - loss: 0.7827 - accuracy: 0.6203 - val_loss: 0.5893 - val_accuracy: 0.7231
Epoch 10/30
57/57 [==============================] - 0s 8ms/step - loss: 0.7552 - accuracy: 0.6326 - val_loss: 0.5765 - val_accuracy: 0.7475
Epoch 11/30
57/57 [==============================] - 0s 8ms/step - loss: 0.7435 - accuracy: 0.6344 - val_loss: 0.5651 - val_accuracy: 0.7491
Epoch 12/30
57/57 [==============================] - 0s 8ms/step - loss: 0.7266 - accuracy: 0.6447 - val_loss: 0.5565 - val_accuracy: 0.7530
Epoch 13/30
57/57 [==============================] - 0s 8ms/step - loss: 0.7142 - accuracy: 0.6509 - val_loss: 0.5530 - val_accuracy: 0.7604
Epoch 14/30
57/57 [==============================] - 0s 8ms/step - loss: 0.7088 - accuracy: 0.6516 - val_loss: 0.5358 - val_accuracy: 0.7598
Epoch 15/30
57/57 [==============================] - 1s 9ms/step - loss: 0.6979 - accuracy: 0.6554 - val_loss: 0.5340 - val_accuracy: 0.7658
Epoch 16/30
57/57 [==============================] - 0s 8ms/step - loss: 0.6941 - accuracy: 0.6553 - val_loss: 0.5324 - val_accuracy: 0.7637
Epoch 17/30
57/57 [==============================] - 0s 9ms/step - loss: 0.6857 - accuracy: 0.6554 - val_loss: 0.5252 - val_accuracy: 0.7695
Epoch 18/30
57/57 [==============================] - 0s 9ms/step - loss: 0.6805 - accuracy: 0.6651 - val_loss: 0.5200 - val_accuracy: 0.7581
Epoch 19/30
57/57 [==============================] - 0s 8ms/step - loss: 0.6663 - accuracy: 0.6623 - val_loss: 0.5292 - val_accuracy: 0.7626
Epoch 20/30
57/57 [==============================] - 0s 8ms/step - loss: 0.6674 - accuracy: 0.6649 - val_loss: 0.5265 - val_accuracy: 0.7632
Epoch 21/30
57/57 [==============================] - 0s 8ms/step - loss: 0.6549 - accuracy: 0.6698 - val_loss: 0.5288 - val_accuracy: 0.7672
Epoch 22/30
57/57 [==============================] - 0s 8ms/step - loss: 0.6508 - accuracy: 0.6719 - val_loss: 0.5216 - val_accuracy: 0.7635
Epoch 23/30
57/57 [==============================] - 0s 8ms/step - loss: 0.6492 - accuracy: 0.6700 - val_loss: 0.5251 - val_accuracy: 0.7526
Epoch 24/30
57/57 [==============================] - 0s 8ms/step - loss: 0.6421 - accuracy: 0.6767 - val_loss: 0.5424 - val_accuracy: 0.7637
Epoch 25/30
57/57 [==============================] - 0s 8ms/step - loss: 0.6412 - accuracy: 0.6651 - val_loss: 0.5218 - val_accuracy: 0.7647
Epoch 26/30
57/57 [==============================] - 0s 8ms/step - loss: 0.6332 - accuracy: 0.6767 - val_loss: 0.5226 - val_accuracy: 0.7579
Epoch 27/30
57/57 [==============================] - 0s 8ms/step - loss: 0.6278 - accuracy: 0.6689 - val_loss: 0.5278 - val_accuracy: 0.7623
Epoch 28/30
57/57 [==============================] - 0s 8ms/step - loss: 0.6249 - accuracy: 0.6761 - val_loss: 0.5353 - val_accuracy: 0.7575
min val loss: 0.5200090408325195
In [55]:
lab = ['train','test','validation']
title = ['1 good', '0 defect']
sets = [[G_trainpcomb, G_testpcomb, G_validpcomb], [D_trainpcomb, D_testpcomb, D_validpcomb]]

ae_reconstruct(lab, title, sets, autoencodercomb)
In [56]:
title = ['0 defect', '1 good']
sets = [Dpcomb, Gpcomb]
thresholdcomb, lossescomb = ae_reconstruct_fullset(title, sets, autoencodercomb)
In [ ]:
 

Class prediction with the autoencoder reconstruction error

In [57]:
y_pred_ae = ae_classy(losses[0], threshold)
y_pred_aefft = ae_classy(lossesfft[0], thresholdfft)
y_pred_aecomb = ae_classy(lossescomb[0], thresholdcomb)

y_pred_ae_good = ae_classy(losses[1], threshold)
y_pred_aefft_good = ae_classy(lossesfft[1], thresholdfft)
y_pred_aecomb_good = ae_classy(lossescomb[1], thresholdcomb)

unique, counts = np.unique(y_pred_ae, return_counts=True)
uniquefft, countsfft = np.unique(y_pred_aefft, return_counts=True)
uniquecomb, countscomb = np.unique(y_pred_aecomb, return_counts=True)

uniquegood, countsgood = np.unique(y_pred_ae_good, return_counts=True)
uniquefftgood, countsfftgood = np.unique(y_pred_aefft_good, return_counts=True)
uniquecombgood, countscombgood = np.unique(y_pred_aecomb_good, return_counts=True)



index = ['predict defect']

df_aedefect = pd.DataFrame({'realspace': counts[0],
                   'fft': countsfft[0],
                   'combined': countscomb[0]}, index=index)

df_aegood = pd.DataFrame({'realspace': countsgood[0],
                   'fft': countsfftgood[0],
                   'combined': countscombgood[0]}, index=index)

df_ae = pd.concat([df_aedefect, df_aegood], axis=0, sort=False, keys=['true defect','true good'])


fig = plt.figure(figsize=(14,7))

ax = fig.add_subplot(111)
ax.set_title('Autoencoder: outlier prediction on defect lines')
df_ae.plot.bar(rot=0, ax=ax, width = 0.4)
plt.show()
print(df_ae.T)


#false negative label for viz
y_pred_ae[y_pred_ae_good==0] = 2
y_pred_aefft[y_pred_aefft_good==0] = 2
y_pred_aecomb[y_pred_aecomb_good==0] = 2
             true defect      true good
          predict defect predict defect
realspace              6              1
fft                   41             12
combined              40              1

The autoencoder classifies no samples as false negatives, since the threshold is placed at the maximum reconstruction error of the good lines.
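The helper `ae_classy` from `helper.py` is not shown in this notebook; a minimal sketch of how such a threshold classifier could look (label convention assumed from the cell titles above: 1 = good, 0 = defect):

```python
import numpy as np

def classify_by_reconstruction_error(losses, threshold):
    """Label a sample as defect (0) when its reconstruction loss
    exceeds the threshold, otherwise as good (1)."""
    losses = np.asarray(losses)
    return np.where(losses > threshold, 0, 1)

# toy losses: two clear outliers above the threshold of 0.5
losses = [0.1, 0.2, 0.9, 0.15, 1.2]
print(classify_by_reconstruction_error(losses, threshold=0.5))  # [1 1 0 1 0]
```

This is only a sketch of the thresholding idea, not the actual `ae_classy` implementation.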

In [58]:
index = ['GMM','AE']

df = pd.concat([df_gmm, df_ae], axis=0, sort=False, keys=index)

fig = plt.figure(figsize=(14,7))

ax = fig.add_subplot(111)
ax.set_title('GMM vs Autoencoder: outlier prediction on defect lines')

df.plot.bar(ax=ax, width = 0.4)
plt.show()
print(df.T)
                     GMM                            AE               
             true defect      true good    true defect      true good
          predict defect predict defect predict defect predict defect
realspace              7              1              6              1
fft                   20              2             41             12
combined               8              1             40              1

Visualising the dataset with t-SNE

The datasets are reduced to two dimensions with t-SNE and visualised.

In [63]:
lab = ['0 defect realspace','0 defect fft','0 defect combined']
D_tsne = []
G_tsne = []
lr = 500

p = [25] 

for d, l in zip([Dp, Dpfft, Dpcomb],lab):
    for ind, i in enumerate(p):
        print(f'##############\n{ind+1}/{len(p)}: t-sne p={i} learning_rate={lr} for {np.shape(d)} {l}\n##############')
        tsne = TSNE(n_components=2,  perplexity=i, learning_rate=lr, verbose=0, angle=0.2, n_jobs=-1)
        Y = tsne.fit_transform(d)

        # note: none of the labels is a substring of '0 defect', so this
        # branch is never taken and all embeddings end up in G_tsne,
        # which is the list used for plotting below
        if l in '0 defect':
            D_tsne.append(Y)
        else:
            G_tsne.append(Y)
##############
1/1: t-sne p=25 learning_rate=500 for (17089, 84) 0 defect realspace
##############
##############
1/1: t-sne p=25 learning_rate=500 for (17089, 54) 0 defect fft
##############
##############
1/1: t-sne p=25 learning_rate=500 for (17089, 104) 0 defect combined
##############
In [67]:
index = ['AE','GMM']
lab = ['0 defect realspace','0 defect fft','0 defect combined']
y_preds_ae = [y_pred_ae, y_pred_aefft, y_pred_aecomb]
y_preds_gmm = [y_pred_gmm, y_pred_gmmfft,y_pred_gmmcomb]
y_preds=[y_preds_ae, y_preds_gmm]
cmaps = [cm.rainbow, cm.jet]
xrange = [[-55, -25],[15, 45],[-5, 25]]
yrange = [[-55, -25],[75, 105],[30, 60]]

for d, la, y_pred, cma, indx in zip([G_tsne, G_tsne], [lab, lab], y_preds, cmaps, index):
    
    for i, X, xra, yra, y, l in zip(count(), d, xrange, yrange, y_pred, la):
        print('############## plot', indx, np.shape(X), l, ' for p =', p, '####################################################################')
        fig = plt.figure(figsize=(14,14))
        
        ax3 = fig.add_subplot(111)
        ax3.set_title(f'||tsne reduced: perplexity={p} ||{indx} {l} lines||')
        scat3 = ax3.scatter(X[:,0], X[:,1], marker='.', c=y,  cmap=cma)
        plt.legend(*scat3.legend_elements(), loc="lower left",  title="Classes")
        plt.show()
        
        fig = plt.figure(figsize=(14,5))

            
        ax = fig.add_subplot(111)
        ax.set_title(f'Outlier Region')
        ax.set_xlim(xra)
        ax.set_ylim(yra)
        scat = ax.scatter(X[:,0], X[:,1], marker='.', c=y,  cmap=cma)
        plt.legend(*scat.legend_elements(), loc="lower left", title="Classes")

        
        plt.show()
############## plot AE (17089, 2) 0 defect realspace  for p = [25] ####################################################################
############## plot AE (17089, 2) 0 defect fft  for p = [25] ####################################################################
############## plot AE (17089, 2) 0 defect combined  for p = [25] ####################################################################
############## plot GMM (17089, 2) 0 defect realspace  for p = [25] ####################################################################
############## plot GMM (17089, 2) 0 defect fft  for p = [25] ####################################################################
############## plot GMM (17089, 2) 0 defect combined  for p = [25] ####################################################################
dill.dump_session('notebook_env.db')

Conclusion

In conclusion, outlier detection via the reconstruction error worked well. The combined dataset appears to be particularly well suited, although it remains to be clarified whether scaling the other datasets would improve the detection rate. The realspace data in particular would probably benefit, since it represents a voltage edge whose value range differs considerably over time.

The threshold for the autoencoder was set slightly (by 1/5) below the maximum loss of the good lines in order to identify somewhat more outliers, at the cost of a few false positives.
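The exact threshold rule lives in `ae_reconstruct_fullset` and is not shown in this notebook; one plausible reading of "1/5 below the maximum loss of the good lines" is:

```python
import numpy as np

def pick_threshold(good_losses, margin=0.2):
    """Place the threshold a fraction `margin` (here 1/5) below the
    maximum reconstruction loss observed on the good lines."""
    max_loss = np.max(good_losses)
    return max_loss * (1.0 - margin)

good_losses = np.array([0.1, 0.3, 0.25, 0.5])
print(pick_threshold(good_losses))  # 0.4
```

Lowering the margin trades a few false positives for additional detected outliers, as described above.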

With the various Gaussian Mixture Models, considerably fewer outliers were identified at a similar false positive rate. They are also different data points than those identified by the autoencoder. In the t-SNE visualisation, the outliers of the autoencoder model tend to lie at the edge, whereas those of the Gaussian Mixture Models are more scattered; in the combined dataset they are mostly located in one corner.

Only the PCA-reduced datasets were used, in order to reduce the influence of the "curse of dimensionality". This seems particularly effective for the combined dataset, whose number of features could be cut to a third while it still revealed the most outliers via the reconstruction error.
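The project's exact PCA settings are defined earlier in the notebook; as a standalone illustration of how PCA shrinks a redundant feature set, here is the variance-fraction variant on synthetic low-rank data (dimensions are stand-ins, not the project's):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# 104 observed features driven by only 10 latent factors plus small noise
Z = rng.normal(size=(300, 10))
W = rng.normal(size=(10, 104))
X = Z @ W + 0.01 * rng.normal(size=(300, 104))

# keep just enough components to explain 95 % of the variance
pca = PCA(n_components=0.95)
X_red = pca.fit_transform(X)
print(X_red.shape)  # roughly (300, 10): close to the true latent dimension
```

The reduced representation retains almost all of the variance while discarding the redundant directions that inflate distance-based methods.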

For the combined dataset, other combination methods could also be investigated. In this project, the realspace was concatenated after the fftspace.
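The concatenation itself is just a horizontal stack along the feature axis; the feature counts below are illustrative stand-ins, not the project's actual dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
realspace = rng.random((5, 84))  # toy stand-in for per-line realspace features
fftspace  = rng.random((5, 54))  # toy stand-in for per-line fft features

# realspace chained after the fftspace, as in this project
combined = np.hstack([fftspace, realspace])
print(combined.shape)  # (5, 138)
```

Alternatives worth exploring include scaling each block before stacking, or training separate models per block and fusing their scores.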

Several analysis methods (F-test, correlation coefficient, ...) could also be applied in order to make a sensible pre-selection of features before the entire dataset is reduced with PCA.
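One way such a pre-selection could look with scikit-learn's F-test filter (`SelectKBest` with `f_classif`); the data here are synthetic, not the project's:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
# only features 3 and 7 carry signal about the class label
y = (X[:, 3] + X[:, 7] > 0).astype(int)

# keep the two features with the highest F-score w.r.t. the labels
selector = SelectKBest(f_classif, k=2).fit(X, y)
print(sorted(selector.get_support(indices=True)))  # [3, 7]
```

Such a filter would only be usable where labels exist; for the purely good-line training sets, unsupervised criteria (e.g. variance or mutual correlation) would be the analogous choice.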

Since the algorithms were trained exclusively on the good lines, this can also be called novelty detection. One proven approach is, among others, the One-Class SVM, which uses its support vectors to draw boundaries against novelties.
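`OneClassSVM` is already imported at the top of this notebook but never used; a self-contained sketch of it as a novelty detector on synthetic 2-D data (`nu` and `gamma` would need tuning on the real features):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_good  = rng.normal(0, 1, size=(500, 2))   # train only on "good" samples
X_novel = rng.normal(6, 0.5, size=(20, 2))  # far-away novelties

# nu bounds the fraction of training points treated as boundary violations
ocsvm = OneClassSVM(kernel='rbf', nu=0.05, gamma='scale').fit(X_good)
print(ocsvm.predict(X_novel))  # mostly -1 (novelty); +1 would mean inlier
```

As with the autoencoder threshold, `nu` controls the trade-off between missed novelties and false positives on the good lines.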

Another approach to outlier detection is the Isolation Forest, which is essentially a large number of strongly constrained decision trees. This approach is also known as ensemble learning.
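A corresponding sketch with scikit-learn's `IsolationForest` (synthetic data, not part of the project code): points that can be isolated with few random splits receive short path lengths and are flagged as outliers.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_good = rng.normal(0, 1, size=(500, 2))            # dense "good" cluster
X_out  = np.array([[8.0, 8.0], [-7.0, 9.0]])        # easily isolated points

iso = IsolationForest(n_estimators=100, contamination='auto',
                      random_state=0).fit(X_good)
print(iso.predict(X_out))  # [-1 -1]: both flagged as outliers
```

`decision_function` additionally yields a continuous anomaly score, which could be thresholded analogously to the reconstruction error above.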

In [ ]: